Abstract We revisit our recent study [Predicting results of the Research Excellence
Framework using departmental h-index, Scientometrics, 2014, 1-16; arXiv:1411.1996]
in which we attempted to predict outcomes of the UK’s Research Excellence
Framework (REF 2014) using the so-called departmental h-index. Here we report 
that our predictions failed to anticipate with any accuracy either overall REF 
outcomes or movements of individual institutions in the rankings relative to their 
positions in the previous Research Assessment Exercise (RAE 2008). 

Keywords peer review • Hirsch index • Research Assessment Exercise (RAE) • 
Research Excellence Framework (REF) 


1 Introduction 

The results of the most recent national exercise in research assessment in the UK - the
Research Excellence Framework (REF) - became available at the end of December
2014. As with its precursor - the Research Assessment Exercise (RAE) - a vast
amount of discussion and debate surrounds the process, especially concerning the
merits of such peer-review based exercises themselves, but also about whether or
not they could sensibly be replaced or supplemented by the usage of quantitative
metrics.

In this context, and before the results of the REF were announced, we attempted
to use departmental h-indices to predict REF results [1]. In particular,
after demonstrating that the h-index is better correlated with the results from
RAE 2008 than a competing metric, we determined h values for the period examined
in the REF. We looked at four subject areas: biology, chemistry, physics
and sociology, and placed our predictions in the public domain, including in this
journal, promising to revisit the paper after REF results were announced [1].
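
For readers unfamiliar with the metric, the following minimal sketch shows the standard Hirsch definition: the largest h such that h papers have at least h citations each. It assumes that a departmental value is obtained by applying this definition to the pooled papers attributed to a department in the assessment window; the function name and the citation counts are illustrative and are not taken from the data used in [1].

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the pooled papers of one department.
department_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_index(department_citations))  # prints 3
```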

Here we fulfil that promise and compare h-predictions with the outcomes of
the REF. We report that our predictions were wildly inaccurate.

Our previous paper drew considerable interest in the media and on the
blogosphere. Anticipating a similar degree of interest in the results of our analysis
presented here, we also reflect on its implications. 


2 Predicting the REF 

The results of both RAE 2008 and REF 2014 are disseminated as quality profiles
which partition submitted research of higher education institutes (HEIs) into five
bands, decreasing in quality from world-leading to unclassified. To capture this
gradation in a single summary statistic, in [1] we used a post-RAE funding formula
devised by the Higher Education Funding Council for England and denoted
by s. We observed discipline-dependent correlations between s values from the
RAE 2008 and departmental h-indices: the correlation coefficients varied between
0.55 and 0.8. Such results are not good enough to consider replacing RAE/REF by
citation-based measures since even small differentials in rating can have considerable
consequences for HEIs in terms of reputation and the state funding received.
Nevertheless, we considered it interesting to check the extent to which the results
from REF 2014 would correlate with departmental h-indices and whether
predictions could be made.

Following the notation of [1], departmental Hirsch indices based on the RAE 2008
assessment period from 2001 to 2007 are denoted h_2008. Those for papers published
between 2008 and 2013 (the REF 2014 assessment period) are denoted h_2014 here.
We are interested in two types of prediction, each measured by correlation
coefficients. Firstly, a “global” picture, representing comparisons between s-values
delivered by the REF and h-values delivered by Hirsch indices, is gauged by Pearson’s
coefficient r and Spearman’s rank correlation coefficient ρ. Secondly, individual
universities are also interested in a more local picture - whether they move up or down in
the REF rankings relative to HEIs competing in a particular subject area. It is
not unreasonable to assume that a shift upwards or downwards in the Hirsch-index
rankings would be accompanied by movement in the same direction in the
RAE/REF rankings. In this manner, one may seek to predict, not the exact positions
of various institutions in the rankings, but relative directions of movement.
These are also measured by correlation coefficients.
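
The two "global" coefficients can be sketched as follows. This is a minimal pure-Python illustration, assuming the textbook definitions of Pearson's r and of Spearman's ρ as Pearson's r applied to average ranks; the function names and the sample values are hypothetical and do not reproduce the data analysed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical s-values and h-indices for four departments.
s_values = [10, 20, 30, 40]
h_values = [1, 4, 9, 16]
print(round(pearson(s_values, h_values), 3))
print(round(spearman(s_values, h_values), 3))
```

In this toy example the relation is monotonic but not linear, so the rank coefficient is exactly 1 while Pearson's r falls slightly below 1, which is why both coefficients are reported side by side.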

Therefore, we seek to address two questions: what is the correlation between
REF 2014 scores and the corresponding departmental h-indices, and is it possible
to predict the tendencies of changes in the rankings of submitting institutes?


3 Results

Before delivering the results of our analysis, we comment that the list of submitters
to REF 2014 is different to that of RAE 2008. Moreover, for technical reasons
it was not possible to obtain the citation data, and therefore to calculate h-indices,
for a small number of institutions (those that were not listed in the Scopus database
used after refining the search results). We have to limit our analysis to those HEIs
for which RAE, REF and h-index scores are available. This reduction in data-set
size can affect correlation coefficients. For example, the correlation coefficients
r ≈ 0.74 and ρ ≈ 0.78 between RAE 2008 and h_2008, published previously for
Biology (see [1], table 1), were calculated for the 39 groups which submitted to
RAE 2008 and for which the Scopus data were available. But if we drop the 8
HEIs which did not submit in this unit of assessment in REF 2014, the resulting
correlation coefficient values change to r ≈ 0.55 and ρ ≈ 0.61, respectively. This
caveat notwithstanding, we compare the ranked lists of HEIs for which all four
scores are available. These comprise 31 Biology groups, 29 Chemistry groups, 32
Physics groups and 25 Sociology groups.
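
The restriction to institutions with all four scores can be sketched as a simple intersection of keys. The snippet below is only an illustration under the assumption that each measure is stored as a mapping from institution name to score; the names and numbers are hypothetical placeholders, not the actual submissions.

```python
def common_institutions(*score_tables):
    """Return, in sorted order, the names present in every score table."""
    names = set(score_tables[0])
    for table in score_tables[1:]:
        names &= set(table)
    return sorted(names)


def aligned(names, table):
    """List the scores of the given institutions in a fixed order."""
    return [table[name] for name in names]


# Hypothetical scores: "C" did not submit to the REF, "D" lacks citation data.
s_rae = {"A": 2.9, "B": 2.4, "C": 2.1, "D": 1.8}
h_2008 = {"A": 41, "B": 30, "C": 22}
s_ref = {"A": 3.1, "B": 2.6, "D": 2.0}
h_2014 = {"A": 38, "B": 33}

names = common_institutions(s_rae, h_2008, s_ref, h_2014)
print(names)                   # prints ['A', 'B']
print(aligned(names, s_rae))   # prints [2.9, 2.4]
```

Aligning every measure on the same list of names before computing any coefficient ensures that all comparisons in a given discipline refer to the same set of institutions.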

We also recall that both the RAE and the REF had three components, one
of which involved outputs (publications) only. (The other components, which
contributed less than outputs to overall quality profiles, were research environment
and esteem or non-academic impact.) Since the Hirsch index is a function of
citations to publications, we compare both to overall RAE/REF scores and to the
scores coming from outputs only.

The calculated correlation coefficients are given in Table 1. In the table we
present separately the results for overall s values (upper part) and for s-values
corresponding to outputs only (lower part). Comparing the values presented in
columns 2-5, one can see that the REF 2014 scores are not much better
correlated with departmental h-indices than those of RAE 2008. The correlation coefficients
are positive but still not strong enough to make accurate predictions or to replace
REF with metrics. As already found in [2], the output component of REF is
more weakly correlated with the citations-based metric for Biology, Chemistry
and Physics.

The last column in Table 1 indicates the correlations between predicted
and actual directions of shift (up or down) in the ranked lists based on different
measures. The correlations are weakly positive or even negative. This approach,
however, does not take into account different magnitudes of h-index shifts. We
surmised that there may be a threshold such that only h-index changes greater
than a critical value tend to manifest changes in the same direction in the s-ranks.
However, no such threshold was found, at least using the limited data available.
This means that it is not possible to use the departmental h-index in this manner
to predict whether a given HEI will move up or down in the REF rankings relative
to other HEIs.
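
One possible way to quantify agreement between predicted and actual directions of movement is sketched below. It is not a reproduction of the procedure used for Table 1: it assumes that a shift is the change in rank position between the two exercises, that direction is the sign of that shift, that agreement is measured by Pearson's r between the two sign sequences, and that the threshold is applied to the size of the h-rank shift. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r; returns NaN when it is undefined (too few points or no variance)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def rank_positions(scores):
    """Map name -> position (1 = highest score); ties are broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def rank_shifts(before, after):
    """Positive shift = moved up the ranking between the two exercises."""
    pos_before, pos_after = rank_positions(before), rank_positions(after)
    return {name: pos_before[name] - pos_after[name] for name in before}


def sign(value):
    return (value > 0) - (value < 0)


def direction_correlation(h_before, h_after, s_before, s_after, threshold=0):
    """Correlate predicted (h-based) and actual (s-based) directions of shift.

    All four mappings must share the same institution names. Institutions whose
    h-rank shift is smaller in magnitude than `threshold` are ignored.
    """
    predicted = rank_shifts(h_before, h_after)
    actual = rank_shifts(s_before, s_after)
    kept = [name for name in predicted if abs(predicted[name]) >= threshold]
    return pearson([sign(predicted[n]) for n in kept],
                   [sign(actual[n]) for n in kept])


# Hypothetical data for four institutions.
h_2008 = {"A": 40, "B": 30, "C": 20, "D": 10}
h_2014 = {"A": 35, "B": 45, "C": 15, "D": 25}
s_rae = {"A": 3.0, "B": 2.5, "C": 2.0, "D": 1.5}
s_ref = {"A": 2.8, "B": 3.1, "C": 2.2, "D": 1.9}
print(round(direction_correlation(h_2008, h_2014, s_rae, s_ref), 3))
```

Scanning `threshold` over a range of values and watching whether the coefficient rises is one way to search for a critical shift of the kind described above; with small samples the number of retained institutions drops quickly, so any apparent threshold should be treated with caution.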


4 Conclusions and Discussion 

Here we present conclusions coming from the two parts of this study [1]. Given the
broad levels of interest in the first part [1], we feel an extended discussion on the
role of metrics in national research assessment exercises of the types considered
here is warranted.

It is well documented that the REF itself is flawed (lack of robust calibration
process, insufficient scrutiny of each and every sub-discipline, inevitable human
error and bias, lack of robust normalisation between disciplines, etc.) and
Goodhart’s law informs us that when measures become targets, they cease to be good
measures. Despite these shortcomings, peer review is currently widely considered
to be the most acceptable way to evaluate research quality. The only national
process in current usage in the UK is the REF, so any replacement would presumably
have to be able to mimic it accurately to be accepted by policy makers and
the academic community.

We have investigated whether departmental h-indices could play such a role.
We found that the correlations between departmental h-scores and REF 2014
results are more or less the same as those between the former and RAE 2008 [1].
Although they sometimes correlate well, the similarities are not good enough to
make accurate predictions; h-indices used in this way do not track the peer review
exercises well enough for them to form a component of, or substitute for, those
exercises. Additionally, we found very poor correlations between the predicted and 
actual changes in the ratings. This means that the departmental h-index does not 
offer a way to foretell the direction of changes of universities in the rankings in 
these subject areas. 

It is worthwhile taking a step back to review what we are attempting to do with 
scientometrics in the context of national assessment exercises. Academic research is 
a special kind of activity, often founded purely on curiosity. Although applications 
may not be obvious in the short term, curiosity-driven research has led to some 
of the most important practical advances our civilisation has produced, including 
discoveries in medicine and technology. Some of these advances have arisen decades
after the scientific discoveries which underlie them. Since commercial exploitability
may be impossible to predict in a reasonable time frame or entirely absent from
blue-skies research, curiosity-driven research is mostly carried out at universities
and is funded by the public purse. The REF, and its precursor, the RAE, are
intended to monitor this public investment in the UK. Other countries have other
schemes, many involving the use of metrics. The pertinent question is whether
these are fit for purpose.


Table 1 The values of Pearson coefficients r and Spearman rank correlation coefficients ρ,
calculated for different disciplines for different pairs of measures. The numbers of HEIs which
were taken into account to calculate the corresponding pair of coefficients (Pearson and
Spearman) are given in parentheses. All values, except those in boldface, are statistically significant
at the level α = 0.05. The upper part uses s values from the overall RAE and REF results
(s_RAE and s_REF, respectively) while the lower part corresponds to the results for outputs only.
Correlations between predicted and actual directions of shift (up or down) in the ranked lists
are given in the final column.

OVERALL           s_RAE vs. h_2008    s_REF vs. h_2014    Predicted vs. actual shift
                  r        ρ          r        ρ          r
Biology (31)      0.55     0.61       0.58     0.63       -0.15
Chemistry (29)    0.80     0.83       0.84     0.89        0.05
Physics (32)      0.49     0.55       0.55     0.50        0.26
Sociology (25)    0.50     0.39       0.59     0.60        0.18

OUTPUTS ONLY      s_RAE vs. h_2008    s_REF vs. h_2014    Predicted vs. actual shift
                  r        ρ          r        ρ          r
Biology (31)      0.44     0.51       0.40     0.42       -0.33
Chemistry (29)    0.74     0.71       0.71     0.72        0.20
Physics (32)      0.44     0.51       0.39     0.36        0.02
Sociology (25)    0.41     0.29       0.71     0.68        0.06

Belief that metrics in current use are counter-productive has led to a recent 
groundswell of opinion against them - see, e.g., the San Francisco Declaration on 
Research Assessment [3]. In France the CNRS has questioned the use of bibliometric
indicators - including the h-index - in the evaluation of scientific
research [5]. One argument is that, in the increasingly managed environments of
many universities around the world, where academic freedom has already lost 
ground to semi-industrialised processes, the introduction of metrics would further 
undermine environments for basic research. In seeking to maximise metric scores, 
fashionable and incremental research may be promoted over foundational scientific 
inquiry. This is potentially devastating to an endeavour which is at the very heart 
of what it is to be human and a foundation of our society - curiosity itself. 

So is there a place for metrics in future national assessment exercises? As for 
any other blunt tool, quantitative metrics can be useful if used in the correct manner,
by informed subject experts. But in the wrong hands, they can be dangerous.
Our study shows that a very different landscape would have emerged in the UK
if REF 2014 had been entirely and simplistically based on the automated departmental
h-index. A wise academic subject expert can, perhaps, use such a metric to
gain perspective in combination with other approaches, taking into account many
nuances such as scientific context, subject history and history of science generally,
technical aspects, future perspectives, interdisciplinarity and so on. Clearly, however,
over-reliance on a single metric by persons who are not subject experts could
be misleading, especially in increasingly managed landscapes in which academic 
traditions are diminished or eroded. 